Genbit Compress Tool (GBC): A Java-Based Tool to Compress DNA Sequences and Compute Compression Ratio (bits/base) of Genomes

Authors

  • P. Raja Rajeswari
  • Allam Apparo
  • V. K. Kumar

Abstract

We present a compression tool, “GenBit Compress”, for genetic sequences, based on our newly proposed “GenBit Compress Algorithm”. The tool achieves the best compression ratios for entire genomes (DNA sequences). Significantly better compression results show that the GenBit Compress algorithm is the best among existing genome compression algorithms for non-repetitive DNA sequences in genomes. Standard compression algorithms such as gzip or compress cannot compress DNA sequences; they only expand them in size. In this paper we consider the problem of DNA compression. It is well known that one of the main features of DNA sequences is that they contain substrings that are duplicated except for a few random mutations. For this reason, most DNA compressors work by searching for and encoding approximate repeats. We depart from this strategy by searching for and encoding only exact repeats. Our proposed algorithm achieves the best compression ratio for DNA sequences of larger genomes. Inputs of up to 8 lakh (800,000) characters can be given. While achieving the best compression ratios for DNA sequences, our new GenBit Compress program also significantly improves on the running time of all previous DNA compressors. Assigning binary bits to fragments of a DNA sequence is also a unique concept, introduced in this program for the first time in DNA compression.
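
The compression ratio named in the title, bits per base, is the size of the compressed output in bits divided by the number of bases in the input. The Java sketch below illustrates that measure using a plain fixed 2-bit code for A, C, G and T as a reference point. It is not the GenBit Compress algorithm, whose fragment-to-bit assignment is not described in this abstract; the class name `BitsPerBaseSketch`, the method names and the code table are all assumptions made for illustration.

```java
/** Illustrative baseline only; this is NOT the GenBit Compress algorithm. */
class BitsPerBaseSketch {

    /**
     * Packs a DNA string into 2 bits per base, four bases per byte.
     * The code table (A=00, C=01, G=10, T=11) is an assumed convention.
     */
    static byte[] packTwoBit(String dna) {
        byte[] out = new byte[(dna.length() + 3) / 4];
        for (int i = 0; i < dna.length(); i++) {
            int code;
            switch (dna.charAt(i)) {
                case 'A': code = 0; break;
                case 'C': code = 1; break;
                case 'G': code = 2; break;
                case 'T': code = 3; break;
                default:
                    throw new IllegalArgumentException("Unsupported base at index " + i);
            }
            // The first base of each group of four occupies the two highest bits.
            out[i / 4] |= code << (6 - 2 * (i % 4));
        }
        return out;
    }

    /** Compression ratio in bits/base: compressed size in bits over the number of bases. */
    static double bitsPerBase(long compressedBits, long baseCount) {
        if (baseCount <= 0) {
            throw new IllegalArgumentException("baseCount must be positive");
        }
        return (double) compressedBits / baseCount;
    }

    public static void main(String[] args) {
        String dna = "ACGTACGTAC";
        byte[] packed = packTwoBit(dna);

        // Stored as one 8-bit character per base, before any compression.
        System.out.println("text:   " + bitsPerBase(8L * dna.length(), dna.length()));
        // Fixed 2-bit code, counting only the bits that carry bases.
        System.out.println("packed: " + bitsPerBase(2L * dna.length(), dna.length()));
        System.out.println("bytes used: " + packed.length);
    }
}
```

A fixed 2-bit code always yields 2 bits/base, so a DNA-specific compressor is normally judged by how far below that figure it gets on a given genome.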

Similar resources

DNABIT Compress – Genome compression algorithm

Data compression is concerned with how information is organized in data. Efficient storage means removal of redundancy from the data being stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present a compression algorithm, "DNABIT Compress" for DNA sequences based on a novel algorithm of assigning binary bits...

GenomeCompress: A Novel Algorithm for DNA Compression

The genome of an organism contains all hereditary information encoded in DNA. It is therefore extremely important to sequence the genome, which determines how organisms survive, develop and multiply. Over the past three decades, owing to massive efforts in DNA sequencing, the complete genome sequences of a large number of organisms, including humans, have become known, and the genomic databases are growing exponentially...

Biological sequence compression algorithms.

Today, more and more DNA sequences are becoming available. Information about DNA sequences is stored in molecular biology databases. The size and importance of these databases will keep growing in the future, so this information must be stored and communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. The s...

A Method for Image Preprocessing to Improve JPEG Performance

A lot of research has been performed in image compression, and different methods have been proposed. Each of the existing methods achieves different compression rates on various images. By identifying the effective parameters in a compression algorithm and strengthening them in the preprocessing stage, the compression rate of the algorithm can be improved. JPEG is one of the successful compression...

Encoding DNA sequences by integer chaos game representation

Motivation: DNA sequences are fundamental for encoding genetic information. The genetic information may be understood not only by symbolic sequences but also from the hidden signals inside the sequences. The symbolic sequences need to be transformed into numerical sequences so the hidden signals can be revealed by signal processing techniques. All current transformation methods encode DNA seque...

Journal title:
  • CoRR

Volume abs/1006.1193

Publication date 2010